LLMD: Large Language Model Design Ontology

Release: 2024/10/10

Modified on: 2024/10/10
This version:
0.1
Latest version:
https://edrohal.com/llmd
Download serialization:
JSON-LD RDF/XML N-Triples TTL
License:
https://creativecommons.org/publicdomain/zero/1.0/ License
Provenance of this page
Ontology Specification Draft

Abstract

This ontology represents a fragment of the knowledge in the field of Large Language Models (LLMs), focusing on architecture. It provides a framework for comparing LLM designs.

Introduction back to ToC

This ontology offers a simplified structure to understand key architectural variations across LLMs.

LLMD: Overview back to ToC

This ontology has the following classes and properties.

Classes

Object Properties

Data Properties

Annotation Properties

Named Individuals

LLMD: Description back to ToC

This ontology is organized around the Architecture class, the Module class, and the usesModule property. An architecture uses modules, and modules can in turn use other modules, which makes it possible to represent the nested structure of a model. A deep learning model is then linked to its architecture through the hasArchitecture property. We provide a logical and restrictive description of the transformer architecture and its blocks. The number of parameters of a model is specified at the model level with the hasParameters property, and not at the architecture level, which means that two releases of a model with different parameter counts share the same architecture. Finally, we provide a very inclusive definition of a language task, under which multimodal LLMs (MLLMs) count as language models (and therefore as LLMs): a language task only needs to have text or speech among its inputs or outputs.
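
As an illustration of this organisation, the following hand-written Turtle fragment (not an excerpt of the released serialization; the llmd: prefix is simply assumed to expand to https://edrohal.com/llmd#) describes the GPT2 model, its architecture, and a subset of the modules they use, all of which are defined in the cross-reference section below:

@prefix llmd: <https://edrohal.com/llmd#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

llmd:GPT2_MODEL a llmd:LargeLanguageModel ;
    llmd:hasArchitecture llmd:GPT2 ;                      # the architecture is attached to the model
    llmd:hasParameters "1500000000"^^xsd:int .            # the parameter count lives at the model level

llmd:GPT2 a llmd:TransformerDecoderOnly ;
    llmd:usesModule llmd:GPT_EMBEDDING_LAYER ,            # an architecture uses modules
                    llmd:GPT_DECODER_BLOCK .

llmd:GPT_DECODER_BLOCK a llmd:TransformerDecoderBlock ;
    llmd:usesModule llmd:GPT_DECODER_CAUSAL_ATTENTION ,   # modules can in turn use other modules
                    llmd:GPT_DECODER_MLP .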

Cross-reference for LLMD classes, object properties and data properties back to ToC

This section provides details for each class and property defined by LLMD.

Classes

Architecturec back to ToC or Class ToC

IRI: https://edrohal.com/llmd#Architecture

An architecture is the skeleton of a model. It uses some modules, which can in turn use other modules in a nested structure.
has super-classes
has sub-classes
S4 c, Transformer c
is in domain of
uses Module op
is in range of
has Architecture op, is Module Of op
is disjoint with
Organisation c, Data Type c, Module c, Algorithm c

Attention Layerc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#AttentionLayer

A layer implementing attention between query, key and value representations.
has super-classes
Module c
has sub-classes
Causal Attention Layer c, Cross Attention Layer c, Self Attention Layer c
is in domain of
uses Causal Mask dp
is disjoint with
Embedding Layer c, Multi Layer Perceptron c, Normalization Layer c, Transformer Block c

Booleanc back to ToC or Class ToC

IRI: http://schema.org/Boolean

has super-classes
Data Type c

Causal Attention Layerc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#CausalAttentionLayer

A specific type of attention layer in which tokens cannot attend to tokens that come after them in a sequence.
is equivalent to
Attention Layer c and (uses Causal Mask dp value true)
has super-classes
Attention Layer c
has members
BLOOM Decoder Block Causal Attention Layer ni, GPT Decoder Causal Attention ni, T5 Decoder Causal Attention Layer ni
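
In Turtle, the equivalence above corresponds to a hasValue restriction on the usesCausalMask data property; the following hand-written sketch may differ cosmetically from the released TTL serialization:

@prefix llmd: <https://edrohal.com/llmd#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

llmd:CausalAttentionLayer owl:equivalentClass [
    a owl:Class ;
    owl:intersectionOf (
        llmd:AttentionLayer
        [ a owl:Restriction ;
          owl:onProperty llmd:usesCausalMask ;
          owl:hasValue "true"^^xsd:boolean ]        # the causal mask flag must be set to true
    )
] .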

Corporationc back to ToC or Class ToC

IRI: http://schema.org/Corporation

has super-classes
Organisation c
has members
Google ni, Hugging Face ni, Open A I ni

Cross Attention Layerc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#CrossAttentionLayer

An attention layer that performs attention between two different sequences (usually, the encoded source sequence and the generated target sequence).
has super-classes
Attention Layer c
has members
T5 Decoder Cross-Attention Layer ni

Data Typec back to ToC or Class ToC

IRI: https://edrohal.com/llmd#DataType

A class representing the possible types of data that can be inputs to a model, or more generally an algorithm.
has sub-classes
Boolean c, Image c, Speech c, Text c
is in range of
has Input Type op, has Output Type op
is disjoint with
Organisation c, Architecture c, Module c, Algorithm c

Deep Learning Modelc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#DeepLearningModel

A deep learning model uses some modules (equivalently, it has an architecture), and adjusts some parameters through a training task.
is equivalent to
(has Architecture op some Architecture c) and (has Training Task op some Training Task c) and (has Parameters dp some int)
has super-classes
Machine Learning Model c
has sub-classes
Large Language Model c
is in domain of
has Architecture op

Embedding Layerc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#EmbeddingLayer

A layer that implements a lookup table, associating values from a finite domain (like a vocabulary or a bounded set of integers corresponding to positions) to embeddings.
has super-classes
Module c
has sub-classes
Position Embedding Layer c, Token Embedding Layer c
is in domain of
is Transpose Layer op
is in range of
is Transpose Layer op
is disjoint with
Attention Layer c, Normalization Layer c, Transformer Block c

Imagec back to ToC or Class ToC

IRI: https://edrohal.com/llmd#Image

has super-classes
Data Type c

Language Modelc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#LanguageModel

A model that performs some task related to language. It can be multimodal.
is equivalent to
performs Task op some Language Processing Task c
has super-classes
Model c
is in domain of
uses Tokenizer op

Language Processing Taskc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#LanguageProcessingTask

A language processing task is a task related to language in the form of text or speech: it has text or speech among its inputs or among its outputs.
is equivalent to
(has Input Type op value Speech ni) or (has Input Type op value Text ni) or (has Output Type op value Speech ni) or (has Output Type op value Text ni)
has super-classes
Task c
has sub-classes
Language Seq2 Seq Task c
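
In Turtle, this union of hasValue restrictions can be sketched as follows (hand-written, and possibly differing cosmetically from the released TTL serialization); Text and Speech here denote the named individuals, thanks to OWL punning:

@prefix llmd: <https://edrohal.com/llmd#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .

llmd:LanguageProcessingTask owl:equivalentClass [
    a owl:Class ;
    owl:unionOf (
        [ a owl:Restriction ; owl:onProperty llmd:hasInputType  ; owl:hasValue llmd:Speech ]
        [ a owl:Restriction ; owl:onProperty llmd:hasInputType  ; owl:hasValue llmd:Text ]
        [ a owl:Restriction ; owl:onProperty llmd:hasOutputType ; owl:hasValue llmd:Speech ]
        [ a owl:Restriction ; owl:onProperty llmd:hasOutputType ; owl:hasValue llmd:Text ]
    )
] .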

Language Processing Training Taskc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#LanguageProcessingTrainingTask

A training task that is also a language processing task.
is equivalent to
Language Processing Task c and Training Task c
has members
Masked Language Modeling ni, Next Word Prediction ni

Language Seq2 Seq Taskc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#LanguageSeq2SeqTask

A language sequence to sequence task relates some language input sequence to some language output sequence.
is equivalent to
(has Input Type op exactly 1 ) and (has Output Type op exactly 1 )
has super-classes
Language Processing Task c
has members
Text Summarization ni, Text Translation ni

Large Language Modelc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#LargeLanguageModel

A language model that is considered large (i.e., it has more than 100,000,000 parameters).
is equivalent to
Deep Learning Model c and Language Model c and (has Parameters dp some int[> 100000000])
has super-classes
Deep Learning Model c
has members
BLOOM Model ni, GPT2 Model ni, T5.11b.model ni
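
The parameter threshold is expressed through a datatype restriction on hasParameters; the sketch below is hand-written, and the choice of xsd:int together with the minExclusive facet is an assumption derived from the "more than 100,000,000 parameters" wording above:

@prefix llmd: <https://edrohal.com/llmd#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

llmd:LargeLanguageModel owl:equivalentClass [
    a owl:Class ;
    owl:intersectionOf (
        llmd:DeepLearningModel
        llmd:LanguageModel
        [ a owl:Restriction ;
          owl:onProperty llmd:hasParameters ;
          owl:someValuesFrom [
              a rdfs:Datatype ;
              owl:onDatatype xsd:int ;                                         # datatype assumed from the declared range
              owl:withRestrictions ( [ xsd:minExclusive "100000000"^^xsd:int ] )
          ] ]
    )
] .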

Machine Learning Modelc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#MachineLearningModel

A model that can be used to make predictions after having been fitted to some data. Examples: linear models, support-vector machines, large language models...
has super-classes
Model c
has sub-classes
Deep Learning Model c
is in domain of
has Training Task op

Mambac back to ToC or Class ToC

IRI: https://edrohal.com/llmd#Mamba

An S4 architecture that uses the selection mechanism to parametrize the SSM parameters through linear projections of the input.
has super-classes
S4 c

Modelc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#Model

has super-classes
Algorithm c
has sub-classes
Language Model c, Machine Learning Model c
is in domain of
Published In dp, has Parameters dp, performs Task op, published By op
is in range of
has Published op

Modulec back to ToC or Class ToC

IRI: https://edrohal.com/llmd#Module

Any (parametrisable or not) block that can be put together with other modules to create another module or an architecture.
has sub-classes
Attention Layer c, Embedding Layer c, Multi Layer Perceptron c, Normalization Layer c, Transformer Block c
is in domain of
is Module Of op, uses Module op
is in range of
is Module Of op, uses Module op
is disjoint with
Task c, Organisation c, Architecture c, Data Type c, Algorithm c

Multi Layer Perceptronc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#MultiLayerPerceptron

A module consisting of the association of several linear layers together with activation functions (which can be the identity function).
is equivalent to
Single Layer Perceptron c or ((uses Module op only Single Layer Perceptron c) and (uses Module op min 1 Single Layer Perceptron c))
has super-classes
Module c
has sub-classes
Single Layer Perceptron c
has members
Bert Encoder Block MLP ni, GPT Decoder MLP ni, T5 Decoder MLP ni
is disjoint with
Attention Layer c, Normalization Layer c, Transformer Block c

Normalization Layerc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#NormalizationLayer

A module that performs renormalization of the input with respect to some measure.
has super-classes
Module c
has members
BERT Encoder Normalization Layer ni, Bloom Embedding Layer Normalization ni, GPT Decoder Normalization Layer ni, T5 Decoder Normalization Layer ni
is disjoint with
Embedding Layer c, Attention Layer c, Multi Layer Perceptron c, Transformer Block c

Position Embedding Layerc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#PositionEmbeddingLayer

An embedding that depends only on the position of a certain token in a sequence.
has super-classes
Embedding Layer c
has members
BLOOM Decoder Block ALIBI LAYER ni, GPT2 Absolute Position Embedding Layer ni, T5 Relative Position Embedding ni

Research Organisationc back to ToC or Class ToC

IRI: http://schema.org/ResearchOrganisation

has super-classes
Organisation c
is in domain of
funded By op
has members
Google AI ni

S4c back to ToC or Class ToC

IRI: https://edrohal.com/llmd#S4

Structured State Space for Sequence Modeling (S4) architecture, modeling dependencies inside sequences through the use of SSM layers.
has super-classes
Architecture c
has sub-classes
Mamba c

Self Attention Layerc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#SelfAttentionLayer

An attention layer that performs attention using queries, keys, and values obtained as projections of the same sequence.
has super-classes
Attention Layer c
has members
Bert Encoder Attention Layer ni

Single Layer Perceptronc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#SingleLayerPerceptron

A single linear layer.
has super-classes
Multi Layer Perceptron c
is in domain of
is Transpose Layer op
is in range of
is Transpose Layer op
has members
BERT Encoder Block MLP Layer 1 ni, BERT Encoder Block MLP Layer 2 ni, BLOOM Desembedding layer ni, GPT Decoder MLP Layer 1 ni, GPT Decoder MLP Layer 2 ni, GPT Desembedding Layer ni, T5 Decoder MLP Layer 1 ni, T5 Decoder MLP Layer 2 ni, T5 Desembedding Layer ni

Speechc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#Speech

has super-classes
Data Type c
has members
Speech ni
is also defined as
named individual

Supervised Training Taskc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#SupervisedTrainingTask

A supervised training task consists in associating a given input with a desired output.
has super-classes
Training Task c
is disjoint with
Unsupervised Training Task c

Taskc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#Task

A task consists of a well-defined objective relating some input(s) to some output(s).
is equivalent to
(has Input Type op some Data Type c) and (has Output Type op some Data Type c)
has sub-classes
Language Processing Task c, Training Task c
is in domain of
has Input Type op, has Output Type op
is in range of
performs Task op
is disjoint with
Module c

Textc back to ToC or Class ToC

IRI: http://schema.org/Text

has super-classes
Data Type c
has members
Text ni

Token Embedding Layerc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#TokenEmbeddingLayer

An embedding layer that associates tokens from a vocabulary with embeddings.
has super-classes
Embedding Layer c
has members
BLOOM Embedding Layer ni, GPT Embedding Layer ni, T5 Embedding Layer ni

Tokenizerc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#Tokenizer

An algorithm that builds a vocabulary of tokens using a corpus. Tokenizers also exist for non-textual data.
has super-classes
Algorithm c
is in range of
uses Tokenizer op
has members
Byte Pair Encoding Tokenizer ni, Byte Pair Encoding With Space Tokenizer ni, Sentence Piece ni

Training Taskc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#TrainingTask

A task used to train a model through the optimisation of a given objective function.
has super-classes
Task c
has sub-classes
Supervised Training Task c, Unsupervised Training Task c
is in range of
has Training Task op

Transformerc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#Transformer

The architecture introduced in (Vaswani et al. 2017), using multi-head attention over input tokens to model sequence dependencies, and stacking transformer blocks.
is equivalent to
Transformer Decoder Only c or Transformer Encoder Decoder c or Transformer Encoder Only c
has super-classes
Architecture c
has sub-classes
Transformer Decoder Only c, Transformer Encoder Decoder c, Transformer Encoder Only c

Transformer Blockc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#TransformerBlock

A block used in the transformer architecture, using attention, layer normalization, and a multi layer perceptron.
is equivalent to
Transformer Decoder Block c or Transformer Encoder Block c
has super-classes
Module c
has sub-classes
Transformer Decoder Block c, Transformer Encoder Block c
is disjoint with
Embedding Layer c, Attention Layer c, Multi Layer Perceptron c, Normalization Layer c

Transformer Decoder Blockc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#TransformerDecoderBlock

A transformer decoder block has a causal attention layer, a multi layer perceptron, and some normalization layer(s). It can also have a cross attention layer when it is part of an encoder-decoder architecture.
is equivalent to
(uses Module op some Normalization Layer c) and (uses Module op only Causal Attention Layer c or Cross Attention Layer c or Multi Layer Perceptron c or Normalization Layer c) and (uses Module op exactly 1 Causal Attention Layer c) and (uses Module op exactly 1 Multi Layer Perceptron c) and (uses Module op max 1 Cross Attention Layer c)
has super-classes
Transformer Block c
has members
BLOOM Decoder Block ni, GPT Decoder Block ni, T5 Decoder Block ni
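
The equivalence above combines an existential restriction, a universal restriction, and qualified cardinality restrictions on usesModule; a hand-written Turtle sketch (which may differ cosmetically from the released TTL serialization) reads:

@prefix llmd: <https://edrohal.com/llmd#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

llmd:TransformerDecoderBlock owl:equivalentClass [
    a owl:Class ;
    owl:intersectionOf (
        # at least one normalization layer
        [ a owl:Restriction ;
          owl:onProperty llmd:usesModule ;
          owl:someValuesFrom llmd:NormalizationLayer ]
        # only attention, MLP, and normalization modules are allowed
        [ a owl:Restriction ;
          owl:onProperty llmd:usesModule ;
          owl:allValuesFrom [ a owl:Class ;
            owl:unionOf ( llmd:CausalAttentionLayer llmd:CrossAttentionLayer
                          llmd:MultiLayerPerceptron llmd:NormalizationLayer ) ] ]
        # exactly one causal attention layer
        [ a owl:Restriction ;
          owl:onProperty llmd:usesModule ;
          owl:qualifiedCardinality "1"^^xsd:nonNegativeInteger ;
          owl:onClass llmd:CausalAttentionLayer ]
        # exactly one multi layer perceptron
        [ a owl:Restriction ;
          owl:onProperty llmd:usesModule ;
          owl:qualifiedCardinality "1"^^xsd:nonNegativeInteger ;
          owl:onClass llmd:MultiLayerPerceptron ]
        # at most one cross attention layer (only in encoder-decoder stacks)
        [ a owl:Restriction ;
          owl:onProperty llmd:usesModule ;
          owl:maxQualifiedCardinality "1"^^xsd:nonNegativeInteger ;
          owl:onClass llmd:CrossAttentionLayer ]
    )
] .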

Transformer Decoder Onlyc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#TransformerDecoderOnly

A transformer using only a stack of decoder blocks.
is equivalent to
Transformer c and (uses Module op some Transformer Decoder Block c) and (uses Module op only Embedding Layer c or Multi Layer Perceptron c or Normalization Layer c or Position Embedding Layer c or Transformer Decoder Block c)
has super-classes
Transformer c
has members
BLOOM Architecture ni, GPT2 Architecture ni
is disjoint with
Transformer Encoder Decoder c, Transformer Encoder Only c

Transformer Encoder Blockc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#TransformerEncoderBlock

A transformer encoder block has a self attention layer and a multi layer perceptron layer, and some normalization layer(s).
is equivalent to
(uses Module op some Normalization Layer c) and (uses Module op exactly 1 Multi Layer Perceptron c) and (uses Module op exactly 1 Self Attention Layer c)
has super-classes
Transformer Block c
has members
BERT Encoder Block ni

Transformer Encoder Decoderc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#TransformerEncoderDecoder

A transformer using an encoder stack and a decoder stack.
is equivalent to
Transformer c and ((uses Module op some Transformer Decoder Block c) and (uses Module op some Transformer Encoder Block c)) and (uses Module op only Embedding Layer c or Multi Layer Perceptron c or Normalization Layer c or Position Embedding Layer c or Transformer Decoder Block c or Transformer Encoder Block c)
has super-classes
Transformer c
has members
T5 ni
is disjoint with
Transformer Decoder Only c, Transformer Encoder Only c

Transformer Encoder Onlyc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#TransformerEncoderOnly

A transformer using only an encoder stack.
is equivalent to
Transformer c and (uses Module op some Transformer Encoder Block c) and (uses Module op only Embedding Layer c or Multi Layer Perceptron c or Normalization Layer c or Position Embedding Layer c or Transformer Encoder Block c)
has super-classes
Transformer c
is disjoint with
Transformer Decoder Only c, Transformer Encoder Decoder c

Unsupervised Training Taskc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#UnsupervisedTrainingTask

An unsupervised training task uses unlabeled data to define input-output pairs that a model should associate.
is equivalent to
Training Task c and (not (Supervised Training Task c))
has super-classes
Training Task c
has members
Masked Language Modeling ni, Next Word Prediction ni
is disjoint with
Supervised Training Task c

Object Properties

funded Byop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#fundedBy

A research organisation is usually funded by some other organisations, which can be corporations, but also governments or even other research organisations.
has domain
Research Organisation c
has range
Organisation c

has Architectureop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#hasArchitecture

A deep learning model has an architecture which corresponds to a set of modules that are linked together.

has characteristics: functional

has domain
Deep Learning Model c
has range
Architecture c

has Input Typeop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#hasInputType

The input type of a task.
has domain
Task c
has range
Data Type c

has Output Typeop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#hasOutputType

The output type of a task.
has domain
Task c
has range
Data Type c

has Publishedop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#hasPublished

has domain
Organisation c
has range
Model c
is inverse of
published By op

has Training Taskop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#hasTrainingTask

has domain
Machine Learning Model c
has range
Training Task c

is Module Ofop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#isModuleOf

has characteristics: asymmetric, irreflexive

has domain
Module c
has range
Architecture c or Module c
is inverse of
uses Module op

is Transpose Layerop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#isTransposeLayer

Some linear layers have weights tied in such a way that the matrix representing the transformation associated with one is the transpose of the matrix associated with the other. This property is extended to embedding layers when they are seen as linear layers that send a one-hot vector to the embedding of the associated item.

has characteristics: symmetric

has domain
Embedding Layer c or Single Layer Perceptron c
has range
Embedding Layer c or Single Layer Perceptron c

performs Taskop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#performsTask

A model performs a task when it is able to reliably associate the input of the task to the output of the task.
has domain
Model c
has range
Task c

published Byop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#publishedBy

A model is usually published by some organization.

has characteristics: functional

has domain
Model c
has range
Organisation c
is inverse of
has Published op

uses Moduleop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#usesModule

An architecture or module can use a module to perform some task. This property corresponds to a hierarchy in the architecture and is therefore asymmetric. It relates to the functional structure of the architecture and not directly to connections in a neural network; this is why a module is not permitted to use itself in order to model a recurrent neural network: the property is irreflexive.

has characteristics: asymmetric, irreflexive

has domain
Architecture c or Module c
has range
Module c
is inverse of
is Module Of op
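
A hand-written Turtle sketch of this property declaration (which may differ cosmetically from the released TTL serialization) is:

@prefix llmd: <https://edrohal.com/llmd#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

llmd:usesModule a owl:ObjectProperty ,
                  owl:AsymmetricProperty ,         # the module hierarchy has no two-way cycles
                  owl:IrreflexiveProperty ;        # and no module uses itself
    rdfs:domain [ a owl:Class ; owl:unionOf ( llmd:Architecture llmd:Module ) ] ;
    rdfs:range  llmd:Module ;
    owl:inverseOf llmd:isModuleOf .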

uses Tokenizerop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#usesTokenizer

Most language models need tokenization of their text input (not all: the Mamba architecture is able to operate at the byte level).

has characteristics: functional

has domain
Language Model c
has range
Tokenizer c

Data Properties

has Parametersdp back to ToC or Data Property ToC

IRI: https://edrohal.com/llmd#hasParameters

has domain
Model c
has range
int

Published Indp back to ToC or Data Property ToC

IRI: https://edrohal.com/llmd#PublishedIn

has domain
Model c
has range
date Time

uses Causal Maskdp back to ToC or Data Property ToC

IRI: https://edrohal.com/llmd#usesCausalMask

has characteristics: functional

has domain
Attention Layer c
has range
boolean

Annotation Properties

is Rule Enabledap back to ToC or Annotation Property ToC

IRI: http://swrl.stanford.edu/ontologies/3.3/swrla.owl#isRuleEnabled

Named Individuals

Bert Encoder Attention Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BERT_ENCODER_ATTENTION_LAYER

The multi-head self-attention layer found in BERT-style encoder blocks.
belongs to
Self Attention Layer c

BERT Encoder Blockni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BERT_ENCODER_BLOCK

The classical encoder block as implemented in BERT, and later other encoders like T5.
belongs to
Transformer Encoder Block c
has facts
uses Module op Bert Encoder Attention Layer ni
uses Module op Bert Encoder Block MLP ni
uses Module op BERT Encoder Normalization Layer ni

Bert Encoder Block MLPni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BERT_ENCODER_MLP

The two-layer perceptron with ReLU activation used at the end of BERT-style encoder blocks.
belongs to
Multi Layer Perceptron c
has facts
uses Module op BERT Encoder Block MLP Layer 1 ni
uses Module op BERT Encoder Block MLP Layer 2 ni

BERT Encoder Block MLP Layer 1ni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BERT_ENCODER_MLP_LAYER_1

The first linear layer of the two-layered perceptron at the end of the BERT-style encoder block.
belongs to
Single Layer Perceptron c

BERT Encoder Block MLP Layer 2ni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BERT_ENCODER_MLP_LAYER_2

The second linear layer of the two-layered perceptron at the end of the BERT-style encoder block.
belongs to
Single Layer Perceptron c

BERT Encoder Normalization Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BERT_ENCODER_NORMALIZATION_LAYER

The layer normalization layer used in the BERT encoder block.
belongs to
Normalization Layer c

BLOOM Architectureni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BLOOM

The BLOOM architecture, a decoder-only transformer that applies a layer normalization after the embedding layer and whose language modeling head (desembedding layer) has its weights tied with the embedding layer.
belongs to
Transformer Decoder Only c
has facts
uses Module op BLOOM Decoder Block ni
uses Module op BLOOM Desembedding layer ni
uses Module op BLOOM Embedding Layer ni
uses Module op Bloom Embedding Layer Normalization ni

BLOOM Decoder Blockni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BLOOM_DECODER_BLOCK

A block of the decoder stack in the BLOOM architecture. It uses the same architecture as GPT1, except for the additional ALIBI module inside the attention layer.
belongs to
Transformer Decoder Block c
has facts
uses Module op BLOOM Decoder Block Causal Attention Layer ni
uses Module op GPT Decoder MLP ni
uses Module op GPT Decoder Normalization Layer ni

BLOOM Decoder Block ALIBI LAYERni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BLOOM_DECODER_BLOCK_ALIBI_LAYER

The ALIBI module used inside the BLOOM decoder block to tune the attention scores depending on relative positions of tokens.
belongs to
Position Embedding Layer c

BLOOM Decoder Block Causal Attention Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BLOOM_DECODER_BLOCK_CAUSAL_ATTENTION_LAYER

The causal attention layer used inside the BLOOM decoder block.
belongs to
Causal Attention Layer c
has facts
uses Module op BLOOM Decoder Block ALIBI LAYER ni
uses Causal Mask dp "true"^^boolean

BLOOM Desembedding layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BLOOM_DESEMBEDDING_LAYER

BLOOM's top-of-the-stack linear layer that outputs logits corresponding to the token distribution; its weights are tied with those of the embedding layer.
belongs to
Single Layer Perceptron c
has facts
is Transpose Layer op BLOOM Embedding Layer ni

BLOOM Embedding Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BLOOM_EMBEDDING_LAYER

The embedding layer at the start of the BLOOM architecture, associating tokens with vectors.
belongs to
Token Embedding Layer c
has facts
is Transpose Layer op BLOOM Desembedding layer ni

Bloom Embedding Layer Normalizationni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BLOOM_EMBEDDING_LAYER_NORM

The layer normalization behind the embedding layer of BLOOM.
belongs to
Normalization Layer c

BLOOM Modelni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BLOOM_MODEL

The BLOOM model trained and implemented by the BigScience initiative and published by HuggingFace. It is a multilingual LLM that was "trained in complete transparency" according to HuggingFace.
belongs to
Large Language Model c
has facts
has Architecture op BLOOM Architecture ni
has Training Task op Multi Task Fine Tunning ni
has Training Task op Next Word Prediction ni
performs Task op Next Word Prediction ni
published By op Hugging Face ni
uses Tokenizer op Byte Pair Encoding With Space Tokenizer ni
Published In dp "2022-07-06T00:00:00"^^date Time
has Parameters dp "176000000000"^^int

Byte Pair Encoding Tokenizerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BytePairEncodingTokenizer

The Byte Pair Encoding (BPE) algorithm creates a token vocabulary by greedily merging tokens, starting from characters.
belongs to
Tokenizer c

Byte Pair Encoding With Space Tokenizerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BytePairEncodingWithSpaceTokenizer

A specific implementation of the Byte Pair Encoding algorithm that allows spaces to be part of tokens.
belongs to
Tokenizer c

Googleni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#Google

The Google Corporation.
belongs to
Corporation c

Google AIni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#Google_AI

The Google AI division of Google, focusing on research on the topic of AI.
belongs to
Research Organisation c
has facts
funded By op Google ni

GPT Decoder Blockni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#GPT_DECODER_BLOCK

The transformer decoder block used in the GPT architecture, with only one attention layer and no cross attention layer since GPT is a decoder only transformer.
belongs to
Transformer Decoder Block c
has facts
uses Module op GPT Decoder Causal Attention ni
uses Module op GPT Decoder MLP ni
uses Module op GPT Decoder Normalization Layer ni

GPT Decoder Causal Attentionni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#GPT_DECODER_CAUSAL_ATTENTION

The causal attention layer used in the GPT decoder block.
belongs to
Causal Attention Layer c
has facts
uses Causal Mask dp "true"^^boolean

GPT Decoder MLPni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#GPT_DECODER_MLP

The two layered perceptron used at the end of the decoder block of GPT.
belongs to
Multi Layer Perceptron c
has facts
uses Module op GPT Decoder MLP Layer 1 ni
uses Module op GPT Decoder MLP Layer 2 ni

GPT Decoder MLP Layer 1ni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#GPT_DECODER_MLP_LAYER_1

The first layer of the two layer perceptron in the GPT decoder block.
belongs to
Single Layer Perceptron c

GPT Decoder MLP Layer 2ni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#GPT_DECODER_MLP_LAYER_2

The second layer of the two layer perceptron in the GPT decoder block.
belongs to
Single Layer Perceptron c

GPT Decoder Normalization Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#GPT_DECODER_NORMALIZATION_LAYER

The normalization used inside the GPT decoder block.
belongs to
Normalization Layer c

GPT Desembedding Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#GPT_DESEMBEDDING_LAYER

The GPT architecture's top-of-the-stack linear layer that outputs logits corresponding to the token distribution; its weights are tied with those of the embedding layer.
belongs to
Single Layer Perceptron c
has facts
is Transpose Layer op GPT Embedding Layer ni

GPT Embedding Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#GPT_EMBEDDING_LAYER

The token embedding layer at the start of the GPT transformer architecture.
belongs to
Token Embedding Layer c
has facts
is Transpose Layer op GPT Desembedding Layer ni

GPT2 Absolute Position Embedding Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#GPT_ABSOLUTE_POSITION_EMBEDDING_LAYER

The absolute position embedding used in the GPT1 and GPT2 architectures is a parametrised module that learns embeddings of the input positions.
belongs to
Position Embedding Layer c

GPT2 Architectureni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#GPT2

The architecture of GPT2, which closely follows the GPT1 architecture.
belongs to
Transformer Decoder Only c
has facts
uses Module op GPT2 Absolute Position Embedding Layer ni
uses Module op GPT Decoder Block ni
uses Module op GPT Desembedding Layer ni
uses Module op GPT Embedding Layer ni

GPT2 Modelni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#GPT2_MODEL

The GPT2 model was trained by OpenAI using a similar approach to GPT1, scaling up both the parameters and the training data by a factor of ten. It also uses a different initialisation scheme for training.
belongs to
Large Language Model c
has facts
has Architecture op GPT2 Architecture ni
has Training Task op Next Word Prediction ni
performs Task op Next Word Prediction ni
published By op Open A I ni
uses Tokenizer op Byte Pair Encoding Tokenizer ni
Published In dp "2019-11-05T00:00:00"^^date Time
has Parameters dp "1500000000"^^int

Hugging Faceni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#HuggingFace

HuggingFace, an American/French company known for its extensive Transformers library, which implements many popular deep learning models.
belongs to
Corporation c

Masked Language Modelingni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#MaskedLanguageModeling

An unsupervised training task in which some input text tokens are masked and must be predicted from the context tokens.
belongs to
Language Processing Training Task c
Unsupervised Training Task c
has facts
has Input Type op Text ni

Multi Task Fine Tunningni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#MultiTaskFineTunning

A supervised training task consisting in learning several classical NLP tasks in parallel, such as translation or finding the referent of a pronoun.
has facts
has Input Type op Text ni
has Output Type op Text ni

Next Word Predictionni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#NextWordPrediction

The unsupervised task consisting in the prediction of the next word in a text sentence. It was popularized as a pretraining objective for transformers with GPT1 in the paper "Improving Language Understanding by Generative Pre-Training" (2018).
belongs to
Language Processing Training Task c
Unsupervised Training Task c
has facts
has Input Type op Text ni
has Output Type op Text ni

Open A Ini back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#OpenAI

The American organization, founded as a non-profit, that trained the GPT models.
belongs to
Corporation c

Sentence Pieceni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#SentencePiece

The SentencePiece algorithm, which tokenizes text treated as a sequence of Unicode characters, making tokenization reversible, unlike other methods such as byte pair encoding.
belongs to
Tokenizer c

Speechni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#Speech

belongs to
Speech c
is also defined as
class

T5ni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5

The architecture of the T5 encoder decoder LLM published by Google.
belongs to
Transformer Encoder Decoder c
has facts
uses Module op BERT Encoder Block ni
uses Module op T5 Decoder Block ni
uses Module op T5 Desembedding Layer ni
uses Module op T5 Embedding Layer ni

T5 Decoder Blockni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5_DECODER_BLOCK

The decoder block used in T5: it is a classical transformer decoder block, the main difference being the presence of a cross attention layer.
belongs to
Transformer Decoder Block c
has facts
uses Module op T5 Decoder Causal Attention Layer ni
uses Module op T5 Decoder Cross-Attention Layer ni
uses Module op T5 Decoder MLP ni
uses Module op T5 Decoder Normalization Layer ni

T5 Decoder Causal Attention Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5_DECODER_CAUSAL_ATTENTION_LAYER

The first attention layer used in the T5 decoder block, performing multi head attention on the input tokens with a causal attention mask.
belongs to
Causal Attention Layer c
has facts
uses Module op T5 Relative Position Embedding ni
uses Causal Mask dp "true"^^boolean

T5 Decoder Cross-Attention Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5_DECODER_CROSSATTENTION_LAYER

The second attention layer in the T5 decoder block, performing cross attention with respect to the latent encoder representations.
belongs to
Cross Attention Layer c

T5 Decoder MLPni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5_DECODER_MLP

The two-layer perceptron with ReLU activation used at the end of the T5-style decoder block.
belongs to
Multi Layer Perceptron c
has facts
uses Module op T5 Decoder MLP Layer 1 ni
uses Module op T5 Decoder MLP Layer 2 ni

T5 Decoder MLP Layer 1ni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5_DECODER_MLP_LAYER_1

The first linear layer of the two-layered perceptron at the end of the T5-style decoder block.
belongs to
Single Layer Perceptron c

T5 Decoder MLP Layer 2ni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5_DECODER_MLP_LAYER_2

The second linear layer of the two-layered perceptron at the end of the T5-style decoder block.
belongs to
Single Layer Perceptron c

T5 Decoder Normalization Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5_DECODER_NORMALIZATION_LAYER

The layer normalization used in the T5 decoder block.
belongs to
Normalization Layer c

T5 Desembedding Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5_DESEMBEDDING_LAYER

The linear layer performing the projection onto vocabulary tokens at the end of the T5 architecture. In the T5 architecture, it shares its weights with the embedding layer.
belongs to
Single Layer Perceptron c
has facts
is Transpose Layer op T5 Embedding Layer ni

T5 Embedding Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5_EMBEDDING_LAYER

The embedding layer converting vocabulary tokens to embeddings at the beginning of the T5 architecture.
belongs to
Token Embedding Layer c
has facts
is Transpose Layer op T5 Desembedding Layer ni

T5 Relative Position Embeddingni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5_RELATIVE_POSITION_EMBEDDING

A position embedding technique used in T5 that modifies the logits used inside the attention layer, adding a learned scalar that depends on the position offset between query and key.
belongs to
Position Embedding Layer c

T5.11b.modelni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5.11b.model

The 11B-parameter version of the T5 model published by Google, trained on masked language modeling and a variety of supervised tasks.
belongs to
Large Language Model c
has facts
has Architecture op T5 ni
has Training Task op Masked Language Modeling ni
performs Task op Next Word Prediction ni
performs Task op Text Summarization ni
performs Task op Text Translation ni
published By op Google AI ni
uses Tokenizer op Sentence Piece ni
Published In dp "2019-10-23T00:00:00"^^date Time
has Parameters dp "11000000000"^^int

Textni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#Text

The data type of text contents.
belongs to
Text c

Text Summarizationni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#TextSummarization

A task consisting in summarization of text.
belongs to
Language Seq2 Seq Task c
has facts
has Input Type op Text ni
has Output Type op Text ni

Text Translationni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#TextTranslation

A task consisting in associating a text input representing some information with a text output representing the same information in a different language or formalism. Example: French to German is a translation task.
belongs to
Language Seq2 Seq Task c
has facts
has Input Type op Text ni
has Output Type op Text ni

Legend back to ToC

c: Classes
op: Object Properties
dp: Data Properties
ni: Named Individuals

References back to ToC

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems 30.
Radford, A., Narasimhan, K., Salimans, T., and Sutskever, I. (2018). Improving Language Understanding by Generative Pre-Training. OpenAI.

Acknowledgments back to ToC

The authors would like to thank Silvio Peroni for developing LODE, a Live OWL Documentation Environment, which is used for representing the Cross Referencing Section of this document, and Daniel Garijo for developing Widoco, the program used to create the template used in this documentation.